Text Mining Based on Self-Organizing Map Method for Arabic-English Documents
Abstract
Computer-based information retrieval is becoming increasingly sophisticated and is being exploited in more and more spheres of human activity. Many computer applications are developed as information distribution systems, of which the Internet is one of the best known and most widely used. With enormous quantities of data in different languages available online, it is essential that more efficient methods of language data extraction are developed. This paper therefore focuses on text mining of multilingual datasets. Arabic is a highly derivational and inflectional language that requires proper morphological analysis for effective text mining, yet no standard approach to word stemming has emerged. This work is an attempt towards a tool for analysing Arabic-English texts, achieved through multilingual text mining (MTM) of a combined Arabic-English corpus. The project is based on the Self-Organizing Map (SOM) and uses an Arabic-English text corpus as the test bed. Issues related to Arabic-English text mining, stemming and clustering are discussed. To the author's knowledge, there is no significant literature on SOM techniques applied to Arabic-English text mining. A framework and the outcome of its implementation are presented.
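The abstract does not describe the implementation, so the following is only a minimal sketch of the general SOM idea it refers to: documents are turned into term-count vectors and mapped onto a small two-dimensional grid, so that documents with similar vocabulary tend to land on the same or nearby units. Everything here is an assumption made for illustration: the toy corpus, the function names (`bag_of_words`, `train_som`, `best_matching_unit`), the 3x3 grid, and the linear decay schedules. No stemming is applied, although the paper treats stemming as a central issue for Arabic.

```python
import math
import random


def bag_of_words(docs):
    """Turn whitespace-tokenised documents into term-count vectors."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok in doc.split():
            vec[index[tok]] += 1.0
        vectors.append(vec)
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, vec):
    """Return the (row, col) of the unit whose weight vector is closest to vec."""
    return min(weights, key=lambda rc: sq_dist(weights[rc], vec))


def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a small SOM; all hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # neighbourhood radius shrinks
        for vec in vectors:
            br, bc = best_matching_unit(weights, vec)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (vec[i] - w[i])
    return weights


# Hypothetical toy corpus mixing English and Arabic tokens (no stemming).
docs = [
    "text mining map",
    "text mining cluster",
    "كتاب مكتبة نص",
    "كتاب مكتبة",
]
vocab, vectors = bag_of_words(docs)
som = train_som(vectors)
for doc, vec in zip(docs, vectors):
    print(best_matching_unit(som, vec), doc)
```

Because the two languages share no tokens in this toy corpus, any grouping that emerges reflects vocabulary overlap within each language only; relating Arabic and English documents to each other would need additional processing that this sketch does not attempt.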
Similar resources
A method for multilingual text mining and retrieval using growing hierarchical self-organizing maps
With the increasing amount of multilingual text on the Internet, multilingual text retrieval techniques have become an important research issue. However, the discovery of relationships between different languages remains an open problem. In this paper we propose a method that applies the growing hierarchical self-organizing map (GHSOM) model to discover knowledge from multilingual text docu...
Towards Multilingual Information Discovery through a SOM based Text Mining approach
Text mining has been gaining popularity in the knowledge discovery field, particularly with the increasing availability of digital documents in various languages from all around the world. However, most text mining tools currently focus on processing monolingual documents (particularly English documents) only; little attention has been paid to applying the techniques to handle the document...
Text Mining with the WEBSOM
The emerging field of text mining applies methods from data mining and exploratory data analysis to analyzing text collections and to conveying information to the user in an intuitive manner. Visual, map-like displays provide a powerful and fast medium for portraying information about large collections of text. Relationships between text items and collections, such as similarity, clusters, gaps a...
Mining massive document collections by the WEBSOM method
A viable alternative to the traditional text-mining methods is the WEBSOM, a software system based on the Self-Organizing Map (SOM) principle. Prior to the searching or browsing operations, this method orders a collection of textual items, say, documents, according to their contents, and maps them onto a regular two-dimensional array of map units. Documents that are similar on the basis of their ...
Five-Dimensional Sentiment Analysis of Corpora, Documents and Words
Sentiment analysis has become a widely used approach to assess the emotional content of written documents such as customer feedback. In positive psychology research, the typical one-dimensional analysis framework has been extended to include five dimensions. This five-dimensional model, PERMA, enables a fine-grained analysis of written texts. We propose an approach in which this model, statisti...